{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# ReasoningAgent - Advanced LLM Reasoning with Multiple Search Strategies\n",
    "\n",
    "## Introduction\n",
    "\n",
    "The `ReasoningAgent` is designed to enhance language models' reasoning capabilities through systematic exploration of thought processes. By implementing the Tree of Thoughts (ToT) framework, it enables LLMs like GPT-4 and Llama to break down complex problems into manageable steps and explore multiple solution paths simultaneously.\n",
    "\n",
    "This notebook demonstrates the key features and capabilities of the `ReasoningAgent`, showing how it can effectively reason about problems.\n",
    "\n",
    "## Search Strategies\n",
    "\n",
    "The `ReasoningAgent` supports multiple search strategies for exploring the reasoning space:\n",
    "\n",
    "### 1. Beam Search (Default)\n",
    "- Maintains the top `k` most promising paths at each step\n",
    "- Efficient for problems with clear evaluation criteria\n",
    "- Configurable beam width to balance exploration vs computation\n",
    "- Special case: DFS mode (beam size = 1) for linear reasoning similar to Chain-of-Thought\n",
    "\n",
    "### 2. Monte Carlo Tree Search (MCTS)\n",
    "- Balances exploration and exploitation using UCT formula\n",
    "- Particularly effective for problems with delayed rewards\n",
    "- Stochastic exploration helps avoid local optima\n",
    "- Configurable number of simulations and exploration constant\n",
    "\n",
    "### 3. Language Agent Tree Search (LATS)\n",
    "- Provides immediate reflection feedback before the next simulation\n",
    "- Helps identify poor reasoning paths early for future improvement\n",
    "- Especially useful for complex multi-step reasoning\n",
    "\n",
    "## Core Components\n",
    "\n",
    "1. **Thinker Agent**: Generates potential next steps in the reasoning process\n",
    "2. **Grader Agent**: Evaluates the quality of each reasoning step\n",
    "3. **Interim Execution**: Option to execute the selected steps, enabling stepwise reasoning.\n",
    "4. **Code Execution**: a child user agent will execute code automatically during reasoning\n",
    "5. **Tree Structure**: Organizes thoughts hierarchically for systematic exploration\n",
    "6. **Visualization Tools**: Built-in Graphviz support for analyzing reasoning paths\n",
    "7. **Logging Features**: Log and save thinking trajectories to finetune the language model\n",
    "8. **Configuration Options**: The agent is highly configurable through a single `reason_config` dictionary"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import json\n",
    "import os\n",
    "import random\n",
    "\n",
    "from dotenv import load_dotenv\n",
    "\n",
    "from autogen import AssistantAgent, LLMConfig, UserProxyAgent\n",
    "from autogen.agents.experimental import ReasoningAgent, ThinkNode\n",
    "\n",
    "load_dotenv()\n",
    "# Put your key in the OPENAI_API_KEY environment variable\n",
    "llm_config = LLMConfig(\n",
    "    config_list={\"api_type\": \"openai\", \"model\": \"gpt-5-nano\", \"api_key\": os.getenv(\"OPENAI_API_KEY\")}\n",
    ")\n",
    "\n",
    "question = \"What is the expected maximum dice value if you can roll a 6-sided dice three times?\"\n",
    "random.seed(1)  # setup seed for reproducibility"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Write `last_meaningful_msg` summary function."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def last_meaningful_msg(sender, recipient, summary_args):\n",
    "    import warnings\n",
    "\n",
    "    if sender == recipient:\n",
    "        return \"TERMINATE\"\n",
    "\n",
    "    summary = \"\"\n",
    "    chat_messages = recipient.chat_messages[sender]\n",
    "\n",
    "    for msg in reversed(chat_messages):\n",
    "        try:\n",
    "            content = msg[\"content\"]\n",
    "            if isinstance(content, str):\n",
    "                summary = content.replace(\"TERMINATE\", \"\")\n",
    "            elif isinstance(content, list):\n",
    "                # Remove the `TERMINATE` word in the content list.\n",
    "                summary = \"\\n\".join(\n",
    "                    x[\"text\"].replace(\"TERMINATE\", \"\") for x in content if isinstance(x, dict) and \"text\" in x\n",
    "                )\n",
    "            if summary.strip().rstrip():\n",
    "                return summary\n",
    "        except (IndexError, AttributeError) as e:\n",
    "            warnings.warn(f\"Cannot extract summary using last_msg: {e}. Using an empty str as summary.\", UserWarning)\n",
    "    return summary"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Initialize a `user_proxy` agent."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "user_proxy = UserProxyAgent(\n",
    "    name=\"user_proxy\",\n",
    "    human_input_mode=\"NEVER\",\n",
    "    code_execution_config=False,\n",
    "    is_termination_msg=lambda x: True,  # terminate when reasoning agent responds\n",
    ")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Chain-of-Thought Reasoning with DFS\n",
    "\n",
    "The simplest form of tree-based reasoning uses depth-first search (DFS) to explore a single path, similar to OpenAI's O1 feature.\n",
    "By setting `method=\"dfs\"` in the reason_config, the agent will:\n",
    "1. Generate one reasoning step at a time\n",
    "2. Follow that single path until reaching a conclusion\n",
    "3. Never explore alternative branches\n",
    "\n",
    "Note: The effectiveness depends on the underlying model's training. Models not specifically trained for step-by-step reasoning\n",
    "may show limited improvement with this approach.\n",
    "\n",
    "Note 2: To enable the execution of each selected step before generating the next step suggestions, pass \n",
    "`\"interim_execution\": True` in reason_config."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "reason_agent = ReasoningAgent(\n",
    "    name=\"reason_agent\",\n",
    "    llm_config=llm_config,\n",
    "    system_message=\"answer math questions\",\n",
    "    reason_config={\"method\": \"dfs\", \"max_depth\": 3},  # Using DFS\n",
    "    silent=False,\n",
    "    # NOTE: it is equivalent to use beam size 1 for O1-style reasoning\n",
    "    # reason_config={\"method\": \"beam_search\", \"beam_size\": 1, \"max_depth\": 3},\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ans = user_proxy.initiate_chat(reason_agent, message=question, summary_method=last_meaningful_msg)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(ans.summary)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Beam Search in Tree of Thought\n",
    "\n",
    "Beam Search is a powerful technique used in tree-based reasoning that allows the agent to explore multiple paths simultaneously. By setting `beam_size` greater than 1, the agent can maintain several candidate solutions at each step, evaluating them based on their potential to lead to the best final answer. This method is particularly effective when the solution space is large and complex, as it balances exploration and exploitation, ensuring that promising paths are prioritized while still considering alternative options.\n",
    "\n",
    "In this approach, the agent generates multiple reasoning steps in parallel, allowing it to compare different trajectories and select the most promising ones for further exploration. This can lead to more robust and accurate conclusions, especially in scenarios where intermediate evaluations are critical to the final outcome."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "reason_agent = ReasoningAgent(\n",
    "    name=\"reason_agent\",\n",
    "    llm_config=llm_config,\n",
    "    reason_config={\"method\": \"beam_search\", \"beam_size\": 3},\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ans = user_proxy.initiate_chat(reason_agent, message=question, summary_method=last_meaningful_msg)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(ans.summary)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can see that in this case the agent suggests to execute a script. Later, we will see how it can do this internally."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Beam Search with Batch Grading\n",
    "By default, node grading is performed one at a time. While this approach is often sufficient, certain applications benefit from a batched grading approach on each beam expansion. In other words, instead of grading all nodes across the entire search in a single pass, we group each beam's newly expanded nodes into a single batch for grading. This yields:\n",
    "\n",
    "1. **Context-aware evaluation**: Within a single beam iteration, the grader can compare and contrast multiple node expansions at once.\n",
    "2. **Improved efficiency**: Combining multiple evaluations into one request per beam iteration can reduce the total number of LLM calls.\n",
    "\n",
    "To enable batch grading, set `\"batch_grading\": True` in the `reason_config`. By default, `batch_grading` is set to `False`, meaning individual node grading is performed without batching. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "reason_agent = ReasoningAgent(\n",
    "    name=\"reason_agent\",\n",
    "    llm_config=llm_config,\n",
    "    reason_config={\"method\": \"beam_search\", \"beam_size\": 3, \"batch_grading\": True},\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ans = user_proxy.initiate_chat(reason_agent, message=question, summary_method=last_meaningful_msg)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(ans.summary)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## MCTS\n",
    "This section demonstrates how to use Monte Carlo Tree Search (MCTS) with ReasoningAgent for complex reasoning tasks. MCTS provides several advantages over beam search when:\n",
    "\n",
    "1. Ground truth evaluation is available\n",
    "2. LLM-based evaluation is expensive\n",
    "3. You want to generate diverse, high-quality training data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "mcts_agent = ReasoningAgent(\n",
    "    name=\"mcts_agent\",\n",
    "    llm_config=llm_config,\n",
    "    system_message=\"answer math questions\",\n",
    "    # setup small depth and simulations for conciseness.\n",
    "    reason_config={\"method\": \"mcts\", \"nsim\": 3, \"max_depth\": 4},\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ans = user_proxy.initiate_chat(mcts_agent, message=question, summary_method=last_meaningful_msg)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(ans.summary)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## LATS\n",
    "\n",
    "It is important to note that our reasoning agent operates based on \"process\" and lacks direct access to the environment. In contrast, the LATS approach relies on feedback from the environment. To address this, we utilize our existing grader agent to generate pseudo-rewards and provide feedback. The major difference between our LATS implementation and our MCTS implementation is that the LATS approach incorporate the reflection into prompt context before next round of simulation. You can define the agent using the LATS approach as follows."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "lats_agent = ReasoningAgent(\n",
    "    name=\"mcts_agent\",\n",
    "    llm_config=llm_config,\n",
    "    system_message=\"answer math questions\",\n",
    "    # setup small depth and simulations for conciseness.\n",
    "    reason_config={\"method\": \"lats\", \"nsim\": 3, \"max_depth\": 4},\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "lats_res = user_proxy.initiate_chat(recipient=lats_agent, message=question, summary_method=last_meaningful_msg)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(ans.summary)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Use a different Model for Grading \n",
    "\n",
    "To use a different model for grading instead of gpt-5, pass the `grader_llm_config` argument when initializing the `ReasoningAgent`. This ensures that the grading of trajectories is performed using the specified configuration from the `config_list`, separate from the main `llm_config`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "grader_llm_config = LLMConfig(\n",
    "    config_list={\"api_type\": \"openai\", \"model\": \"gpt-5-nano\", \"api_key\": os.getenv(\"OPENAI_API_KEY\")}\n",
    ")\n",
    "\n",
    "writer = AssistantAgent(\n",
    "    name=\"Writer\",\n",
    "    llm_config=llm_config,\n",
    "    system_message=\"\"\"You are a professional writer, known for your insightful and engaging articles.\n",
    "You transform complex concepts into compelling narratives.\n",
    "You should improve the quality of the content based on the feedback from the user.\n",
    "\"\"\",\n",
    ")\n",
    "reason_agent_for_writer = ReasoningAgent(\n",
    "    name=\"reason_agent\",\n",
    "    llm_config=llm_config,\n",
    "    grader_llm_config=grader_llm_config,\n",
    "    reason_config={\"method\": \"lats\", \"nsim\": 2, \"max_depth\": 3},\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "data = reason_agent._root.to_dict()\n",
    "with open(\"reasoning_tree.json\", \"w\") as f:\n",
    "    json.dump(data, f)\n",
    "\n",
    "# recover the node\n",
    "with open(\"reasoning_tree.json\") as f:\n",
    "    new_node = ThinkNode.from_dict(json.load(f))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sft_data = reason_agent.extract_sft_dataset()\n",
    "rlhf_data = reason_agent.extract_rlhf_preference_dataset()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(rlhf_data)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Utilizing Ground Truth to Enhance Training Data Generation\n",
    "\n",
    "Access to ground truth answers allows us to improve the evaluation of reasoning paths. In this section, we will explore:\n",
    "- The process of incorporating ground truth into prompts\n",
    "- The methods by which the agent leverages ground truth for evaluation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ans = user_proxy.initiate_chat(lats_agent, message=question, summary_method=last_meaningful_msg)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(ans.summary)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Interim Execution During Reasoning\n",
    "You can enable `interim_execution` by setting it to `True` in `reason_config`. This allows intermediate steps to be executed during the reasoning process, promoting more effective step-by-step thinking and enabling future steps to be informed by the outputs of earlier ones. By default `interim_execution` is `False` which means that the selected steps won't be executed during reasoning."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "lats_agent = ReasoningAgent(\n",
    "    name=\"mcts_agent\",\n",
    "    llm_config=llm_config,\n",
    "    system_message=\"answer math questions\",\n",
    "    reason_config={\"method\": \"lats\", \"nsim\": 3, \"max_depth\": 4, \"interim_execution\": True},\n",
    ")\n",
    "\n",
    "ans = user_proxy.initiate_chat(lats_agent, message=question, summary_method=last_meaningful_msg)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(ans.summary)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Code Execution During Reasoning\n",
    "\n",
    "You can setup the parameter `code_execution_config` in reasoning agent to enable code execution during reasoning.\n",
    "By default, `code_execution_config=False`, which means it will not execute code for reasoning. Note that to allow for code execution, `interim_execution` must be set to `True` at `reason_config`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "lats_agent = ReasoningAgent(\n",
    "    name=\"mcts_agent\",\n",
    "    llm_config=llm_config,\n",
    "    system_message=\"answer math questions\",\n",
    "    reason_config={\"method\": \"lats\", \"nsim\": 3, \"max_depth\": 4, \"interim_execution\": True},\n",
    "    code_execution_config={\"use_docker\": False, \"work_dir\": \"mypy_cache\"},\n",
    "    # Enable Code execution. We skip docker here for simplicity\n",
    ")\n",
    "\n",
    "ans = user_proxy.initiate_chat(\n",
    "    lats_agent, message=question + \" Run a python simulation to get the result\", summary_method=last_meaningful_msg\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(ans.summary)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Visualizing the Reasoning Tree\n",
    "\n",
    "### Installation of Graphviz\n",
    "\n",
    "To visualize the reasoning tree, you need to install Graphviz. Please note that using `pip install` may not be sufficient for all operating systems. In some cases, you might need to manually download and install Graphviz.\n",
    "\n",
    "`pip install graphviz`\n",
    "\n",
    "### To save the visualization as \"tree_of_thoughts.png\", run the following command:\n",
    "```python\n",
    "mcts_agent.visualize_tree()\n",
    "```"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Utilizing ReasoningAgent for Nested Chat Interactions\n",
    "\n",
    "In this example, we will explore how the ReasoningAgent can be employed to facilitate nested chat interactions, specifically for writing a blog post about NVIDIA. The agent will engage in a structured dialogue to enhance the quality of the content through iterative feedback and reasoning.\n",
    "\n",
    "### Task: Writing a Blog Post on NVIDIA\n",
    "\n",
    "The goal is to generate a concise yet engaging blog post about NVIDIA. The process involves one turn (for simplicity) of conversation where the agent reflects on the content, reasons about improvements, and incorporates user feedback. You can update the `max_turns` parameter to execute multiple times."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "writer = AssistantAgent(\n",
    "    name=\"Writer\",\n",
    "    llm_config=llm_config,\n",
    "    system_message=\"\"\"You are a professional writer, known for your insightful and engaging articles.\n",
    "You transform complex concepts into compelling narratives.\n",
    "You should improve the quality of the content based on the feedback from the user.\n",
    "\"\"\",\n",
    ")\n",
    "reason_agent_for_writer = ReasoningAgent(\n",
    "    name=\"reason_agent\",\n",
    "    llm_config=llm_config,\n",
    "    reason_config={\"method\": \"lats\", \"nsim\": 2, \"max_depth\": 3},\n",
    ")\n",
    "\n",
    "\n",
    "def reflection_message(recipient, messages, sender, config):\n",
    "    print(\"Reflecting...\", \"yellow\")\n",
    "    return f\"Reflect, Reason and provide critique on the following writing. \\n\\n {recipient.chat_messages_for_summary(sender)[-1]['content']}\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "user_proxy.register_nested_chats(\n",
    "    [\n",
    "        {\n",
    "            \"recipient\": reason_agent_for_writer,\n",
    "            \"message\": reflection_message,\n",
    "            \"summary_method\": \"last_msg\",\n",
    "            \"max_turns\": 1,\n",
    "        }\n",
    "    ],\n",
    "    trigger=writer,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "task = \"\"\"Write a concise but engaging blogpost about Nvidia.\"\"\"\n",
    "res = user_proxy.initiate_chat(recipient=writer, message=task, max_turns=2, summary_method=\"last_msg\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(res.summary)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Use a different Model for Grading \n",
    "\n",
    "To use a different model for grading instead of gpt-5, pass the `grader_llm_config` argument when initializing the `ReasoningAgent`. This ensures that the grading of trajectories is performed using the specified configuration from the `config_list`, separate from the main `llm_config`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Put your key in the OPENAI_API_KEY environment variable\n",
    "grader_llm_config = LLMConfig(\n",
    "    config_list={\"api_type\": \"openai\", \"model\": \"gpt-5-nano\", \"api_key\": os.getenv(\"OPENAI_API_KEY\")}\n",
    ")\n",
    "\n",
    "writer = AssistantAgent(\n",
    "    name=\"Writer\",\n",
    "    llm_config=llm_config,\n",
    "    system_message=\"\"\"You are a professional writer, known for your insightful and engaging articles.\n",
    "You transform complex concepts into compelling narratives.\n",
    "You should improve the quality of the content based on the feedback from the user.\n",
    "    \"\"\",\n",
    ")\n",
    "\n",
    "reason_agent_for_writer = ReasoningAgent(\n",
    "    name=\"reason_agent\",\n",
    "    llm_config=llm_config,\n",
    "    grader_llm_config=grader_llm_config,\n",
    "    reason_config={\"method\": \"lats\", \"nsim\": 2, \"max_depth\": 3},\n",
    ")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Save data to future training\n",
    "In this section, we will focus on saving the reasoning agent's decision-making data to help future training. \n",
    "By capturing the structure and content of the reasoning tree, we can create a valuable dataset that can be used \n",
    "to enhance the agent's learning process. This data will allow us to analyze the agent's reasoning patterns, \n",
    "improve its performance, and refine its ability to generate high-quality responses. \n",
    "The saved data can be utilized for various training methodologies, including supervised fine-tuning and \n",
    "reinforcement learning, ultimately contributing to the development of a more robust and effective reasoning agent."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "data = reason_agent._root.to_dict()\n",
    "with open(\"reasoning_tree.json\", \"w\") as f:\n",
    "    json.dump(data, f)\n",
    "\n",
    "# recover the node\n",
    "with open(\"reasoning_tree.json\") as f:\n",
    "    new_node = ThinkNode.from_dict(json.load(f))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sft_data = reason_agent.extract_sft_dataset()\n",
    "rlhf_data = reason_agent.extract_rlhf_preference_dataset()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(rlhf_data)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Utilizing Ground Truth to Enhance Training Data Generation\n",
    "\n",
    "Access to ground truth answers allows us to improve the evaluation of reasoning paths. In this section, we will explore:\n",
    "- The process of incorporating ground truth into prompts\n",
    "- The methods by which the agent leverages ground truth for evaluation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "prompt = \"\"\"What is the expected maximum dice value if you can roll a 6-sided dice three times?\n",
    "\n",
    "GROUND_TRUTH:\n",
    "We define X as the highest outcome among the three rolls.\n",
    "The probability that X is at least m is 1 - \\\\left(\\frac{m-1}{6}\\right)^3 for each m from 1 to 6.\n",
    "Summing these probabilities gives the expectation E(X) = \\\\sum_{m=1}^{6} [1 - (\\frac{m-1}{6})^3].\n",
    "Calculating this sum results in E(X) = 6 - \\frac{225}{216} = \\frac{119}{24}, which approximates to 4.9583.\n",
    "Therefore, the expected maximum value when rolling a six-sided die three times is \\frac{119}{24} or approximately 4.9583.\n",
    "\"\"\"\n",
    "random.seed(1)  # setup seed for reproducibility\n",
    "\n",
    "mcts_agent2 = ReasoningAgent(\n",
    "    name=\"mcts_agent\",\n",
    "    llm_config=llm_config,\n",
    "    system_message=\"answer math questions\",\n",
    "    # setup small depth and simulations for conciseness.\n",
    "    reason_config={\"method\": \"mcts\", \"nsim\": 3, \"max_depth\": 4},\n",
    ")\n",
    "\n",
    "\n",
    "ans = user_proxy.initiate_chat(mcts_agent2, message=prompt, summary_method=last_meaningful_msg)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(ans.summary)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Forest of Thoughts\n",
    "\n",
    "The concept of a \"Forest of Thoughts\" allows us to leverage bootstrapping techniques to execute the tree of thoughts multiple times, creating a diverse set of answers. After running these independent reasoning processes, we can aggregate them to form our final answer."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "forest_agent = ReasoningAgent(\n",
    "    name=\"mcts_agent\",\n",
    "    llm_config=llm_config,\n",
    "    system_message=\"answer math questions\",\n",
    "    # setup small depth and simulations for conciseness.\n",
    "    reason_config={\"method\": \"dfs\", \"max_depth\": 4, \"forest_size\": 3},\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ans = user_proxy.initiate_chat(forest_agent, message=question, summary_method=last_meaningful_msg)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(ans.summary)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Scope\n",
    "The effectiveness of a LLM agent on a given task can be significantly enhanced through prompt optimization. To support this for the `ReasoningAgent`, a `scope` parameter can be specified during initialization. This parameter will provide valuable context about the agent’s intended use, the reasoning process it should follow, and any constraints or pitfalls to avoid. This information is incorporated into the agent’s thought process to guide its behavior more effectively.\n",
    "\n",
    "Note: The `scope` differs from the `system_message` in that it informs the agent’s reasoning throughout the entire thinking process, whereas the `system_message` is used solely when generating the final response."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "scope = \"\"\"You assess ethical risks of AI systems used in services.\n",
    "Begin by identifying stakeholders and their interests.\n",
    "Then, evaluate potential ethical risks (bias, transparency, impact).\n",
    "Finally, suggest mitigation strategies and ethical safeguards\"\"\"\n",
    "\n",
    "reason_agent = ReasoningAgent(\n",
    "    name=\"reason_agent\",\n",
    "    llm_config=llm_config,\n",
    "    reason_config={\"method\": \"dfs\", \"max_depth\": 3},  # Using DFS\n",
    "    silent=False,\n",
    "    scope=scope,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "question = \"What are the ethical risks of using AI in healthcare?\"\n",
    "ans = user_proxy.initiate_chat(reason_agent, message=question, summary_method=last_meaningful_msg)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(ans.summary)"
   ]
  }
 ],
 "metadata": {
  "front_matter": {
   "description": "Use ReasoningAgent for o1 style reasoning in Agentic workflows with LLMs using AG2",
   "tags": [
    "reasoning agent",
    "tree of thoughts"
   ]
  },
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.11"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
